1.
Nat Commun ; 15(1): 3980, 2024 May 10.
Article in English | MEDLINE | ID: mdl-38730231

ABSTRACT

Schizophrenia is a complex neuropsychiatric disorder with sexually dimorphic features, including differential symptomatology, drug responsiveness, and male incidence rate. Prior large-scale transcriptome analyses for sex differences in schizophrenia have focused on the prefrontal cortex. Analyzing BrainSeq Consortium data (caudate nucleus: n = 399, dorsolateral prefrontal cortex: n = 377, and hippocampus: n = 394), we identified 831 unique genes that exhibit sex differences across brain regions, enriched for immune-related pathways. We observed X-chromosome dosage reduction in the hippocampus of male individuals with schizophrenia. Our sex interaction model revealed 148 junctions dysregulated in a sex-specific manner in schizophrenia. Sex-specific schizophrenia analysis identified dozens of differentially expressed genes, notably enriched in immune-related pathways. Finally, our sex-interacting expression quantitative trait loci analysis revealed 704 unique genes, nine associated with schizophrenia risk. These findings emphasize the importance of sex-informed analysis of sexually dimorphic traits, inform personalized therapeutic strategies in schizophrenia, and highlight the need for increased female samples for schizophrenia analyses.


Subjects
Caudate Nucleus ; Dorsolateral Prefrontal Cortex ; Hippocampus ; Quantitative Trait Loci ; Schizophrenia ; Sex Characteristics ; Humans ; Schizophrenia/genetics ; Schizophrenia/metabolism ; Female ; Male ; Hippocampus/metabolism ; Caudate Nucleus/metabolism ; Dorsolateral Prefrontal Cortex/metabolism ; Adult ; Transcriptome ; Gene Expression Profiling ; Sex Factors ; Chromosomes, Human, X/genetics ; Prefrontal Cortex/metabolism
2.
bioRxiv ; 2023 Oct 05.
Article in English | MEDLINE | ID: mdl-37034760

ABSTRACT

Ancestral differences in genomic variation are determining factors in gene regulation; however, most gene expression studies have been limited to European ancestry samples or adjusted for ancestry to identify ancestry-independent associations. We instead examined the impact of genetic ancestry on gene expression and DNA methylation (DNAm) in admixed African/Black American neurotypical individuals to untangle effects of genetic and environmental factors. Ancestry-associated differentially expressed genes (DEGs), transcripts, and gene networks, while notably not implicating neurons, are enriched for genes related to immune response and vascular tissue and explain up to 26% of heritability for ischemic stroke, 27% of heritability for Parkinson's disease, and 30% of heritability for Alzheimer's disease. Ancestry-associated DEGs also show general enrichment for heritability of diverse immune-related traits but depletion for psychiatric-related traits. The cell-type enrichments and direction of effects vary by brain region. These DEGs are less evolutionarily constrained and are largely explained by genetic variations; roughly 15% are predicted by DNAm variation implicating environmental exposures. We also compared Black and White Americans, confirming most of these ancestry-associated DEGs. Our results highlight how environment and genetic background affect genetic ancestry differences in gene expression in the human brain and affect risk for brain illness.

3.
PLoS Comput Biol ; 18(6): e1009730, 2022 06.
Article in English | MEDLINE | ID: mdl-35648784

ABSTRACT

Short-read RNA sequencing and long-read RNA sequencing each have their strengths and weaknesses for transcriptome assembly. While short reads are highly accurate, they are rarely able to span multiple exons. Long-read technology can capture full-length transcripts, but its relatively high error rate often leads to mis-identified splice sites. Here we present a new release of StringTie that performs hybrid-read assembly. By taking advantage of the strengths of both long and short reads, hybrid-read assembly with StringTie is more accurate than long-read only or short-read only assembly, and on some datasets it can more than double the number of correctly assembled transcripts, while obtaining substantially higher precision than the long-read data assembly alone. Here we demonstrate the improved accuracy on simulated data and real data from Arabidopsis thaliana, Mus musculus, and human. We also show that hybrid-read assembly is more accurate than correcting long reads prior to assembly while also being substantially faster. StringTie is freely available as open source software at https://github.com/gpertea/stringtie.


Subjects
High-Throughput Nucleotide Sequencing ; Transcriptome ; Algorithms ; Animals ; Exons ; Humans ; Mice ; Sequence Analysis, DNA ; Sequence Analysis, RNA ; Software ; Transcriptome/genetics
4.
Bioinformatics ; 37(20): 3650-3651, 2021 Oct 25.
Article in English | MEDLINE | ID: mdl-33964128

ABSTRACT

SUMMARY: Although the ability to programmatically summarize and visually inspect sequencing data is an integral part of genome analysis, currently available methods are not capable of handling large numbers of samples. In particular, making a visual comparison of transcriptional landscapes between two sets of thousands of RNA-seq samples is limited by available computational resources, which can be overwhelmed due to the sheer size of the data. In this work, we present TieBrush, a software package designed to process very large sequencing datasets (RNA, whole-genome, exome, etc.) into a form that enables quick visual and computational inspection. TieBrush can also be used as a method for aggregating data for downstream computational analysis, and is compatible with most software tools that take aligned reads as input. AVAILABILITY AND IMPLEMENTATION: TieBrush is provided as a C++ package under the MIT License. Precompiled binaries, source code and example data are available on GitHub (https://github.com/alevar/tiebrush). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

5.
F1000Res ; 9, 2020.
Article in English | MEDLINE | ID: mdl-32489650

ABSTRACT

GTF (Gene Transfer Format) and GFF (General Feature Format) are popular file formats used by bioinformatics programs to represent and exchange information about various genomic features, such as gene and transcript locations and structure. GffRead and GffCompare are open source programs that provide extensive and efficient solutions to manipulate files in a GTF or GFF format. While GffRead can convert, sort, filter, transform, or cluster genomic features, GffCompare can be used to compare and merge different gene annotations. Availability and implementation: GFF utilities are implemented in C++ for Linux and OS X and released as open source under an MIT license (https://github.com/gpertea/gffread, https://github.com/gpertea/gffcompare).


Subjects
Computational Biology ; Genomics ; Software ; Genome ; Molecular Sequence Annotation
6.
PLoS Genet ; 16(1): e1008571, 2020 01.
Article in English | MEDLINE | ID: mdl-31986137

ABSTRACT

Long-read sequencing facilitates assembly of complex genomic regions. In plants, loci containing nucleotide-binding, leucine-rich repeat (NLR) disease resistance genes are an important example of such regions. NLR genes constitute one of the largest gene families in plants and are often clustered, evolving via duplication, contraction, and transposition. We recently mapped the Xo1 locus for resistance to bacterial blight and bacterial leaf streak, found in the American heirloom rice variety Carolina Gold Select, to a region that in the Nipponbare reference genome is NLR gene-rich. Here, toward identification of the Xo1 gene, we combined Nanopore and Illumina reads and generated a high-quality Carolina Gold Select genome assembly. We identified 529 complete or partial NLR genes and discovered, relative to Nipponbare, an expansion of NLR genes at the Xo1 locus. One of these has high sequence similarity to the cloned, functionally similar Xa1 gene. Both harbor an integrated zfBED domain, and the repeats within each protein are nearly perfect. Across diverse Oryzeae, we identified two sub-clades of NLR genes with these features, varying in the presence of the zfBED domain and the number of repeats. The Carolina Gold Select genome assembly also uncovered at the Xo1 locus a rice blast resistance gene and a gene encoding a polyphenol oxidase (PPO). PPO activity has been used as a marker for blast resistance at the locus in some varieties; however, the Carolina Gold Select sequence revealed a loss-of-function mutation in the PPO gene that breaks this association. Our results demonstrate that whole genome sequencing combining Nanopore and Illumina reads effectively resolves NLR gene loci. Our identification of an Xo1 candidate is an important step toward mechanistic characterization, including the role(s) of the zfBED domain. Finally, the Carolina Gold Select genome assembly will facilitate identification of other useful traits in this historically important variety.


Subjects
Disease Resistance ; NLR Proteins/genetics ; Oryza/genetics ; Plant Proteins/genetics ; Molecular Sequence Annotation ; NLR Proteins/chemistry ; NLR Proteins/metabolism ; Nanopore Sequencing/methods ; Oryza/immunology ; Plant Proteins/chemistry ; Plant Proteins/metabolism ; Whole Genome Sequencing/methods ; Zinc Fingers
7.
Genome Biol ; 20(1): 278, 2019 12 16.
Article in English | MEDLINE | ID: mdl-31842956

ABSTRACT

RNA sequencing using the latest single-molecule sequencing instruments produces reads that are thousands of nucleotides long. The ability to assemble these long reads can greatly improve the sensitivity of long-read analyses. Here we present StringTie2, a reference-guided transcriptome assembler that works with both short and long reads. StringTie2 includes new methods to handle the high error rate of long reads and offers the ability to work with full-length super-reads assembled from short reads, which further improves the quality of short-read assemblies. StringTie2 is more accurate and faster and uses less memory than all comparable short-read and long-read analysis tools.


Subjects
Genetic Techniques ; Genomics/methods ; Transcriptome ; Animals ; Arabidopsis ; Humans ; Sequence Analysis, RNA ; Software ; Zea mays
8.
Article in English | MEDLINE | ID: mdl-30373801

ABSTRACT

Standard antimicrobial susceptibility testing (AST) approaches lead to delays in the selection of optimal antimicrobial therapy. Here, we sought to determine the accuracy of antimicrobial resistance (AMR) determinants identified by Nanopore whole-genome sequencing in predicting AST results. Using a cohort of 40 clinical isolates (21 carbapenemase-producing carbapenem-resistant Klebsiella pneumoniae, 10 non-carbapenemase-producing carbapenem-resistant K. pneumoniae, and 9 carbapenem-susceptible K. pneumoniae isolates), three separate sequencing and analysis pipelines were performed, as follows: (i) a real-time Nanopore analysis approach identifying acquired AMR genes, (ii) an assembly-based Nanopore approach identifying acquired AMR genes and chromosomal mutations, and (iii) an approach using short-read correction of Nanopore assemblies. The short-read correction of Nanopore assemblies served as the reference standard to determine the accuracy of Nanopore sequencing results. With the real-time analysis approach, full annotation of acquired AMR genes occurred within 8 h from subcultured isolates. Assemblies sufficient for full resistance gene and single-nucleotide polymorphism annotation were available within 14 h from subcultured isolates. The overall agreement of genotypic results and anticipated AST results for the 40 K. pneumoniae isolates was 77% (range, 30% to 100%) and 92% (range, 80% to 100%) for the real-time approach and the assembly approach, respectively. Evaluating the patients contributing the 40 isolates, the real-time approach and assembly approach could shorten the median time to effective antibiotic therapy by 20 h and 26 h, respectively, compared to standard AST. Nanopore sequencing offers a rapid approach to both accurately identify resistance mechanisms and to predict AST results for K. pneumoniae isolates. Bioinformatics improvements enabling real-time alignment, coupled with rapid extraction and library preparation, will further enhance the accuracy and workflow of the Nanopore real-time approach.


Subjects
Bacterial Proteins/genetics ; Drug Resistance, Multiple, Bacterial/genetics ; Genome, Bacterial ; Klebsiella pneumoniae/genetics ; Phenotype ; Whole Genome Sequencing/methods ; beta-Lactamases/genetics ; Anti-Bacterial Agents/metabolism ; Anti-Bacterial Agents/pharmacology ; Bacterial Proteins/metabolism ; Carbapenems/metabolism ; Carbapenems/pharmacology ; Cohort Studies ; Computational Biology/methods ; Gene Expression ; Gene Library ; Humans ; Klebsiella Infections/drug therapy ; Klebsiella Infections/microbiology ; Klebsiella pneumoniae/drug effects ; Klebsiella pneumoniae/enzymology ; Klebsiella pneumoniae/isolation & purification ; Microbial Sensitivity Tests ; Polymorphism, Single Nucleotide ; Whole Genome Sequencing/instrumentation ; beta-Lactamases/metabolism
9.
Genome Biol ; 19(1): 208, 2018 11 28.
Article in English | MEDLINE | ID: mdl-30486838

ABSTRACT

We assembled the sequences from deep RNA sequencing experiments by the Genotype-Tissue Expression (GTEx) project, to create a new catalog of human genes and transcripts, called CHESS. The new database contains 42,611 genes, of which 20,352 are potentially protein-coding and 22,259 are noncoding, and a total of 323,258 transcripts. These include 224 novel protein-coding genes and 116,156 novel transcripts. We detected over 30 million additional transcripts at more than 650,000 genomic loci, nearly all of which are likely nonfunctional, revealing a heretofore unappreciated amount of transcriptional noise in human cells. The CHESS database is available at http://ccb.jhu.edu/chess.


Subjects
Databases, Genetic ; Sequence Analysis, RNA ; Transcription, Genetic ; Amino Acid Sequence ; Animals ; Female ; Humans ; Introns ; Male
10.
Nature ; 551(7681): 498-502, 2017 11 23.
Article in English | MEDLINE | ID: mdl-29143815

ABSTRACT

Aegilops tauschii is the diploid progenitor of the D genome of hexaploid wheat (Triticum aestivum, genomes AABBDD) and an important genetic resource for wheat. The large size and highly repetitive nature of the Ae. tauschii genome has until now precluded the development of a reference-quality genome sequence. Here we use an array of advanced technologies, including ordered-clone genome sequencing, whole-genome shotgun sequencing, and BioNano optical genome mapping, to generate a reference-quality genome sequence for Ae. tauschii ssp. strangulata accession AL8/78, which is closely related to the wheat D genome. We show that compared to other sequenced plant genomes, including a much larger conifer genome, the Ae. tauschii genome contains unprecedented amounts of very similar repeated sequences. Our genome comparisons reveal that the Ae. tauschii genome has a greater number of dispersed duplicated genes than other sequenced genomes and its chromosomes have been structurally evolving an order of magnitude faster than those of other grass genomes. The decay of colinearity with other grass genomes correlates with recombination rates along chromosomes. We propose that the vast amounts of very similar repeated sequences cause frequent errors in recombination and lead to gene duplications and structural chromosome changes that drive fast genome evolution.


Subjects
Genome, Plant ; Phylogeny ; Poaceae/genetics ; Triticum/genetics ; Chromosome Mapping ; Diploidy ; Evolution, Molecular ; Gene Duplication ; Genes, Plant/genetics ; Genomics/standards ; Poaceae/classification ; Recombination, Genetic/genetics ; Sequence Analysis, DNA/standards ; Triticum/classification
11.
G3 (Bethesda) ; 7(11): 3831-3836, 2017 11 06.
Article in English | MEDLINE | ID: mdl-28963165

ABSTRACT

Here we describe the sequencing and assembly of the pathogenic fungus Lomentospora prolificans using a combination of short, highly accurate Illumina reads and additional coverage in very long Oxford Nanopore reads. The resulting assembly is highly contiguous, containing a total of 37,627,092 bp with over 98% of the sequence in just 26 scaffolds. Annotation identified 8896 protein-coding genes. Pulsed-field gel analysis suggests that this organism contains at least 7 and possibly 11 chromosomes, the two longest of which have sizes corresponding closely to the sizes of the longest scaffolds, at 6.6 and 5.7 Mb.


Subjects
Genome, Fungal ; Molecular Sequence Annotation ; Scedosporium/genetics ; Fungal Proteins/genetics ; Whole Genome Sequencing
12.
G3 (Bethesda) ; 7(9): 3157-3167, 2017 09 07.
Article in English | MEDLINE | ID: mdl-28751502

ABSTRACT

A reference genome sequence for Pseudotsuga menziesii var. menziesii (Mirb.) Franco (Coastal Douglas-fir) is reported, thus providing a reference sequence for a third genus of the family Pinaceae. The contiguity and quality of the genome assembly far exceeds that of other conifer reference genome sequences (contig N50 = 44,136 bp and scaffold N50 = 340,704 bp). Incremental improvements in sequencing and assembly technologies are in part responsible for the higher quality reference genome, but it may also be due to a slightly lower exact repeat content in Douglas-fir vs. pine and spruce. Comparative genome annotation with angiosperm species reveals gene-family expansion and contraction in Douglas-fir and other conifers which may account for some of the major morphological and physiological differences between the two major plant groups. Notable differences in the size of the NDH-complex gene family and genes underlying the functional basis of shade tolerance/intolerance were observed. This reference genome sequence not only provides an important resource for Douglas-fir breeders and geneticists but also sheds additional light on the evolutionary processes that have led to the divergence of modern angiosperms from the more ancient gymnosperms.
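The contig and scaffold N50 values quoted above are contiguity statistics. As a minimal sketch, assuming the common definition (the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly), N50 can be computed as below. The function name `n50` and the toy lengths are illustrative only and are not taken from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Assumed definition: sort lengths from longest to shortest and return
    the length at which the running total first reaches half of the
    total assembly size.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("n50() needs at least one sequence length")


# Hypothetical toy assembly of five contigs (lengths in bp).
toy_contigs = [2, 3, 4, 5, 6]
print(n50(toy_contigs))
```

A higher N50 means that fewer, longer sequences account for half of the assembly, which is why the statistic is used as a rough proxy for contiguity.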


Subjects
Genome, Plant ; Photosynthesis/genetics ; Pinaceae/genetics ; Pinaceae/metabolism ; Pseudotsuga/genetics ; Pseudotsuga/metabolism ; Whole Genome Sequencing ; Adaptation, Biological/genetics ; Computational Biology ; Evolution, Molecular ; Gene Duplication ; Gene Regulatory Networks ; Genomics ; Molecular Sequence Annotation ; Multigene Family ; Phylogeny ; Pinaceae/classification ; Proteomics/methods ; Pseudotsuga/classification ; Repetitive Sequences, Nucleic Acid
13.
Nat Protoc ; 11(9): 1650-67, 2016 09.
Article in English | MEDLINE | ID: mdl-27560171

ABSTRACT

High-throughput sequencing of mRNA (RNA-seq) has become the standard method for measuring and comparing the levels of gene expression in a wide variety of species and conditions. RNA-seq experiments generate very large, complex data sets that demand fast, accurate and flexible software to reduce the raw read data to comprehensible results. HISAT (hierarchical indexing for spliced alignment of transcripts), StringTie and Ballgown are free, open-source software tools for comprehensive analysis of RNA-seq experiments. Together, they allow scientists to align reads to a genome, assemble transcripts including novel splice variants, compute the abundance of these transcripts in each sample and compare experiments to identify differentially expressed genes and transcripts. This protocol describes all the steps necessary to process a large set of raw sequencing reads and create lists of gene transcripts, expression levels, and differentially expressed genes and transcripts. The protocol's execution time depends on the computing resources, but it typically takes under 45 min of computer time. HISAT, StringTie and Ballgown are available from http://ccb.jhu.edu/software.shtml.
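The align, assemble and quantify steps summarized above can be sketched as command lines. The Python helper below only builds the argument lists and does not run anything. It is an illustration under stated assumptions, not the protocol itself: the function names, file names and thread count are hypothetical, and the flags shown (`hisat2 --dta`, `samtools sort -o`, `stringtie -G`, `stringtie --merge`, `stringtie -e -B`) are the documented options as best recalled, so check them against the installed versions.

```python
import shlex


def per_sample_commands(sample, index, annotation, threads=8):
    """Build align, sort and assemble commands for one paired-end sample.

    All file names are hypothetical placeholders.
    """
    sam, bam = f"{sample}.sam", f"{sample}.bam"
    return [
        # Spliced alignment; --dta reports alignments tailored for assemblers.
        ["hisat2", "-p", str(threads), "--dta", "-x", index,
         "-1", f"{sample}_1.fastq.gz", "-2", f"{sample}_2.fastq.gz", "-S", sam],
        # Coordinate-sort the alignments into a BAM file.
        ["samtools", "sort", "-@", str(threads), "-o", bam, sam],
        # Reference-guided transcript assembly for this sample.
        ["stringtie", bam, "-p", str(threads), "-G", annotation,
         "-o", f"{sample}.gtf", "-l", sample],
    ]


def merge_command(gtf_list_file, annotation, threads=8):
    """Merge per-sample assemblies into one set of transcripts."""
    return ["stringtie", "--merge", "-p", str(threads), "-G", annotation,
            "-o", "merged.gtf", gtf_list_file]


def quantify_command(sample, threads=8):
    """Re-estimate abundances against merged.gtf and write Ballgown tables."""
    return ["stringtie", "-e", "-B", "-p", str(threads), "-G", "merged.gtf",
            "-o", f"ballgown/{sample}/{sample}.gtf", f"{sample}.bam"]


for command in per_sample_commands("sample1", "genome_index", "genes.gtf"):
    print(shlex.join(command))
print(shlex.join(merge_command("mergelist.txt", "genes.gtf")))
print(shlex.join(quantify_command("sample1")))
```

Keeping each command as a list of arguments makes it straightforward to pass to `subprocess.run` later without shell quoting issues. The differential expression step in Ballgown runs in R and is not shown here.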


Subjects
Gene Expression Profiling/methods ; Sequence Analysis, RNA/methods ; Software ; Statistics as Topic/methods ; Molecular Sequence Annotation ; RNA, Messenger/genetics ; RNA, Messenger/metabolism ; User-Computer Interface
15.
Nat Biotechnol ; 33(3): 290-5, 2015 Mar.
Article in English | MEDLINE | ID: mdl-25690850

ABSTRACT

Methods used to sequence the transcriptome often produce more than 200 million short sequences. We introduce StringTie, a computational method that applies a network flow algorithm originally developed in optimization theory, together with optional de novo assembly, to assemble these complex data sets into transcripts. When used to analyze both simulated and real data sets, StringTie produces more complete and accurate reconstructions of genes and better estimates of expression levels, compared with other leading transcript assembly programs including Cufflinks, IsoLasso, Scripture and Traph. For example, on 90 million reads from human blood, StringTie correctly assembled 10,990 transcripts, whereas the next best assembly was of 7,187 transcripts by Cufflinks, which is a 53% increase in transcripts assembled. On a simulated data set, StringTie correctly assembled 7,559 transcripts, which is 20% more than the 6,310 assembled by Cufflinks. As well as producing a more complete transcriptome assembly, StringTie runs faster on all data sets tested to date compared with other assembly software, including Cufflinks.
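The two percentages quoted above follow directly from the transcript counts given in the abstract. A short check, using only those four counts (the helper name `percent_increase` is illustrative):

```python
def percent_increase(new, baseline):
    """Relative increase of `new` over `baseline`, in percent."""
    return 100.0 * (new - baseline) / baseline


# Counts of correctly assembled transcripts as stated in the abstract.
real_data = percent_increase(10_990, 7_187)   # StringTie vs Cufflinks, human blood
simulated = percent_increase(7_559, 6_310)    # StringTie vs Cufflinks, simulated data

print(f"real data: {real_data:.1f}%  simulated: {simulated:.1f}%")
```

Rounded to whole percentages, these values match the 53% and 20% figures reported in the abstract.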


Subjects
Sequence Analysis, RNA/methods ; Software ; Transcriptome/genetics ; Algorithms ; HEK293 Cells ; Humans ; RNA, Messenger/genetics ; RNA, Messenger/metabolism
16.
Genome Biol ; 14(4): R36, 2013 Apr 25.
Article in English | MEDLINE | ID: mdl-23618408

ABSTRACT

TopHat is a popular spliced aligner for RNA-sequence (RNA-seq) experiments. In this paper, we describe TopHat2, which incorporates many significant enhancements to TopHat. TopHat2 can align reads of various lengths produced by the latest sequencing technologies, while allowing for variable-length indels with respect to the reference genome. In addition to de novo spliced alignment, TopHat2 can align reads across fusion breaks, which can occur after genomic translocations. TopHat2 combines the ability to identify novel splice sites with direct mapping to known transcripts, producing sensitive and accurate alignments, even for highly repetitive genomes or in the presence of pseudogenes. TopHat2 is available at http://ccb.jhu.edu/software/tophat.


Subjects
Gene Duplication ; Gene Fusion ; Mutagenesis, Insertional ; Sequence Alignment/methods ; Software ; Humans ; Sensitivity and Specificity ; Sequence Analysis, RNA/methods ; Transcriptome
17.
Nat Protoc ; 7(3): 562-78, 2012 Mar 01.
Article in English | MEDLINE | ID: mdl-22383036

ABSTRACT

Recent advances in high-throughput cDNA sequencing (RNA-seq) can reveal new genes and splice variants and quantify expression genome-wide in a single assay. The volume and complexity of data from RNA-seq experiments necessitate scalable, fast and mathematically principled analysis software. TopHat and Cufflinks are free, open-source software tools for gene discovery and comprehensive expression analysis of high-throughput mRNA sequencing (RNA-seq) data. Together, they allow biologists to identify new genes and new splice variants of known ones, as well as compare gene and transcript expression under two or more conditions. This protocol describes in detail how to use TopHat and Cufflinks to perform such analyses. It also covers several accessory tools and utilities that aid in managing data, including CummeRbund, a tool for visualizing RNA-seq analysis results. Although the procedure assumes basic informatics skills, these tools assume little to no background with RNA-seq analysis and are meant for novices and experts alike. The protocol begins with raw sequencing reads and produces a transcriptome assembly, lists of differentially expressed and regulated genes and transcripts, and publication-quality visualizations of analysis results. The protocol's execution time depends on the volume of transcriptome sequencing data and available computing resources but takes less than 1 d of computer time for typical experiments and ∼1 h of hands-on time.


Subjects
DNA, Complementary/genetics ; Gene Expression Profiling/methods ; Genetic Association Studies/methods ; Genomics/methods ; Sequence Analysis, DNA/methods ; Software
18.
BMC Bioinformatics ; 12: 274, 2011 Jul 04.
Article in English | MEDLINE | ID: mdl-21726447

ABSTRACT

BACKGROUND: Comparison of the human genome with other primates offers the opportunity to detect evolutionary events that created the diverse phenotypes among the primate species. Because the primate genomes are highly similar to one another, methods developed for analysis of more divergent species do not always detect signs of evolutionary selection. RESULTS: We have developed a new method, called DivE, specifically designed to find regions that have evolved either more or less rapidly than expected, for any clade within a set of very closely related species. Unlike some previous methods, DivE does not rely on rates of synonymous and nonsynonymous substitution, which enables it to detect evolutionary events in noncoding regions. We demonstrate using simulated data that DivE compares favorably to alternative methods, and we then apply DivE to the ENCODE regions in 14 primate species. We identify thousands of regions in these primates, ranging from 50 to >10000 bp in length, that appear to have experienced either constrained or accelerated rates of evolution. In particular, we detected 4942 regions that have potentially undergone positive selection in one or more primate species. Most of these regions occur outside of protein-coding genes, although we identified 20 proteins that have experienced positive selection. CONCLUSIONS: DivE provides an easy-to-use method to predict both positive and negative selection in noncoding DNA, that is particularly well-suited to detecting lineage-specific selection in large genomes.


Subjects
Phylogeny ; Primates/genetics ; Software ; Animals ; Biological Evolution ; Genome ; Genome, Human ; Humans
19.
PLoS Biol ; 8(9), 2010 Sep 07.
Article in English | MEDLINE | ID: mdl-20838655

ABSTRACT

A synergistic combination of two next-generation sequencing platforms with a detailed comparative BAC physical contig map provided a cost-effective assembly of the genome sequence of the domestic turkey (Meleagris gallopavo). Heterozygosity of the sequenced source genome allowed discovery of more than 600,000 high quality single nucleotide variants. Despite this heterozygosity, the current genome assembly (∼1.1 Gb) includes 917 Mb of sequence assigned to specific turkey chromosomes. Annotation identified nearly 16,000 genes, with 15,093 recognized as protein coding and 611 as non-coding RNA genes. Comparative analysis of the turkey, chicken, and zebra finch genomes, and comparing avian to mammalian species, supports the characteristic stability of avian genomes and identifies genes unique to the avian lineage. Clear differences are seen in number and variety of genes of the avian immune system where expansions and novel genes are less frequent than examples of gene loss. The turkey genome sequence provides resources to further understand the evolution of vertebrate genomes and genetic variation underlying economically important quantitative traits in poultry. This integrated approach may be a model for providing both gene and chromosome level assemblies of other species with agricultural, ecological, and evolutionary interest.


Subjects
Genome ; Turkeys/genetics ; Animals ; Base Sequence ; Chromosome Mapping ; DNA/genetics ; Polymorphism, Single Nucleotide ; Sequence Analysis, DNA ; Sequence Homology, Nucleic Acid ; Species Specificity
20.
Nat Biotechnol ; 28(5): 511-5, 2010 May.
Article in English | MEDLINE | ID: mdl-20436464

ABSTRACT

High-throughput mRNA sequencing (RNA-Seq) promises simultaneous transcript discovery and abundance estimation. However, this would require algorithms that are not restricted by prior gene annotations and that account for alternative transcription and splicing. Here we introduce such algorithms in an open-source software program called Cufflinks. To test Cufflinks, we sequenced and analyzed >430 million paired 75-bp RNA-Seq reads from a mouse myoblast cell line over a differentiation time series. We detected 13,692 known transcripts and 3,724 previously unannotated ones, 62% of which are supported by independent expression data or by homologous genes in other species. Over the time series, 330 genes showed complete switches in the dominant transcription start site (TSS) or splice isoform, and we observed more subtle shifts in 1,304 other genes. These results suggest that Cufflinks can illuminate the substantial regulatory flexibility and complexity in even this well-studied model of muscle development and that it can improve transcriptome-based genome annotation.


Subjects
Cell Differentiation/genetics ; Gene Expression Profiling/methods ; Oligonucleotide Array Sequence Analysis/methods ; Protein Isoforms/genetics ; RNA, Messenger/analysis ; Sequence Analysis, RNA/methods ; Algorithms ; Animals ; Cell Line ; Genome ; Mice ; Protein Isoforms/metabolism ; Proto-Oncogene Proteins c-myc/genetics ; Proto-Oncogene Proteins c-myc/metabolism ; RNA, Messenger/genetics ; RNA, Messenger/metabolism ; Software